
United States Patent and Trademark Office 


UNITED STATES DEPARTMENT OF COMMERCE 
United States Patent and Trademark Office 
Address: COMMISSIONER FOR PATENTS 
P.O. Box 1450 

Alexandria, Virginia 22313-1450 
www.uspto.gov 


APPLICATION NO.: 10/601,350
FILING DATE: 06/23/2003
FIRST NAMED INVENTOR: Jonathan H. Connell
ATTORNEY DOCKET NO.: YOR920030166US1
CONFIRMATION NO.: 7454

7590 06/08/2006

Ryan, Mason & Lewis, LLP
90 Forest Avenue
Locust Valley, NY 11560

EXAMINER: ARMSTRONG, ANGELA A
ART UNIT: 2626
PAPER NUMBER:

DATE MAILED: 06/08/2006 


Please find below and/or attached an Office communication concerning this application or proceeding. 


PTO-90C (Rev. 10/03) 


Office Action Summary 

Application No.: 10/601,350
Applicant(s): CONNELL ET AL.
Examiner: Angela A. Armstrong
Art Unit: 2626



- The MAILING DATE of this communication appears on the cover sheet with the correspondence address - 


Period for Reply 
Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d).

DETAILED ACTION

Claim Rejections - 35 USC §101

1. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or 
any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and 
requirements of this title. 

2. Claims 1-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed 
to non-statutory subject matter. 

3. Claims 1-22 define non-statutory processes because they merely manipulate an abstract
idea. The claimed process, a series of steps to be performed by a computer, amounts to a
manipulation of an abstract idea since the process fails to provide any pre- or post-computer
process activity.

Claims 1-9, 19-20 and 22 define non-statutory processes because the claims fail to
include limitations of functional descriptive material that can impart functionality when
employed as computer components so as to yield a useful, tangible, concrete result.

Applicant should note, however, that claims directed to speech or audio signal processing
would be considered to be statutory subject matter. For example, a statutory process may
require the measurements of physical objects or activities to be transformed outside of the
computer into computer data (In re Gelnovatch, 595 F.2d 32, 41 n.7, 201 USPQ 136, 145 n.7
(CCPA 1979) (data-gathering step did not measure physical phenomenon); Arrhythmia, 958 F.2d
at 1056, 22 USPQ2d at 1036), where the data comprises signals corresponding to physical
objects or activities external to the computer system, and where the process causes a
physical transformation of the signals which are intangible representations of the physical objects
or activities. Schrader, 22 F.3d at 294, 30 USPQ2d at 1459 citing with approval Arrhythmia, 958
F.2d at 1058-59, 22 USPQ2d at 1037-38; Abele, 684 F.2d at 909, 214 USPQ at 688; In re Taner,
681 F.2d 787, 790, 214 USPQ 678, 681 (CCPA 1982).

Examples of this type of claimed statutory process include the following: 

- A method of using a computer processor to analyze electrical signals and data 
representative of human cardiac activity by converting the signals to time segments, applying the 
time segments in reverse order to a high pass filter means, using the computer processor to 
determine the amplitude of the high pass filter's output, and using the computer processor to 
compare the value to a predetermined value. In this example the data is an intangible 
representation of physical activity, i.e., human cardiac activity. The transformation occurs when 
heart activity is measured and an electrical signal is produced. This process has real world value 
in predicting vulnerability to ventricular tachycardia immediately after a heart attack. 

- A method of using a computer processor to receive data representing Computerized 
Axial Tomography ("CAT") scan images of a patient, performing a calculation to determine the 
difference between a local value at a data point and an average value of the data in a region 
surrounding the point, and displaying the difference as a gray scale for each point in the image, 
and displaying the resulting image. In this example the data is an intangible representation of a 
physical object, i.e., portions of the anatomy of a patient. The transformation occurs when the 
condition of the human body is measured with X-rays and the X-rays are converted into 
electrical digital signals that represent the condition of the human body. The real world value of 
the invention lies in creating a new CAT scan image of body tissue without the presence of 
bones.

- A method of using a computer processor to conduct seismic exploration, by
imparting spherical seismic energy waves into the earth from a seismic source, generating a 
plurality of reflected signals in response to the seismic energy waves at a set of receiver positions 
in an array, and summing the reflection signals to produce a signal simulating the reflection 
response of the earth to the seismic energy. In this example, the electrical signals processed by 
the computer represent reflected seismic energy. The transformation occurs by converting the 
spherical seismic energy waves into electrical signals, which provide a geophysical 
representation of formations below the earth's surface. Geophysical exploration of formations 
below the surface of the earth has real world value. 
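
For illustration only, the calculation recited in the CAT scan example above (the difference between a local value at a data point and an average value of the data in a region surrounding the point) might be sketched as follows. The function name, the square window, its radius, the inclusion of the point itself in the average, and the clipping of the window at the image borders are all assumptions made for this sketch; none of them is taken from Abele or from the record.

```python
def local_difference(image, radius=1):
    """Return, for each point, its value minus the mean of the surrounding window.

    `image` is a list of equal-length rows of numbers. The window is a
    (2*radius + 1) square centered on the point and clipped at the borders
    (an assumption made for this sketch).
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Clip the window so it never extends past the image edges.
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out_row.append(image[r][c] - sum(window) / len(window))
        result.append(out_row)
    return result


if __name__ == "__main__":
    scan = [[0, 0, 0],
            [0, 9, 0],
            [0, 0, 0]]
    for row in local_difference(scan):
        print(row)
```

In such a sketch, the resulting differences would then be mapped to gray-scale values for display; that mapping is omitted here.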

Examples of claimed processes that independently limit the claimed invention to safe 
harbor include: 

- a method of conducting seismic exploration which requires generating and 
manipulating signals from seismic energy waves before "summing" the values represented by the 
signals (Taner, 681 F.2d at 788, 214 USPQ at 679); and 

- a method of displaying X-ray attenuation data as a signed gray scale signal in a 
"field" using a particular algorithm, where the antecedent steps require generating the data using 
a particular machine (e.g., a computer tomography scanner). Abele, 684 F.2d at 908, 214 USPQ 
at 687 ("The specification indicates that such attenuation data is available only when an X-ray 
beam is produced by a CAT scanner, passed through an object, and detected upon its exit. Only 
after these steps have been completed is the algorithm performed, and the resultant modified data 
displayed in the required format.").

Examples of claimed processes that do not limit the claimed invention to pre-computing
safe harbor include:

- "perturbing" the values of a set of process inputs, where the subject matter 
"perturbed" was a number and the act of "perturbing" consists of substituting the numerical 
values of variables (Gelnovatch, 595 F.2d at 41 n.7, 201 USPQ at 145 n.7 ("Appellants' claimed 
step of perturbing the values of a set of process inputs (step 3), in addition to being a 
mathematical operation, appears to be a data-gathering step of the type we have held insufficient 
to change a nonstatutory method of calculation into a statutory process. ... In this instance, the 
perturbed process inputs are not even measured values of physical phenomena, but are instead 
derived by numerically changing the values in the previous set of process inputs.")); and, 

- selecting a set of arbitrary measurement point values (Sarkar, 588 F.2d at 1331, 200
USPQ at 135).

If a claim does not clearly fall into one or both of the safe harbors, the claim may
still be statutory if it is limited to a practical application in the technological arts.

Claim Rejections - 35 USC §103

The text of those sections of Title 35, U.S. Code not included in this action can be found 
in a prior Office action. 

4. Claims 1-22 are rejected under 35 U.S.C. 103(a) as being unpatentable over Garg et al.,
"Frame-dependent multi-stream reliability indicators for audio-visual speech recognition,"
Proceedings of International Conference on Acoustics, Speech and Signal Processing, ICASSP
2003, vol. 1, April 2003, pages 24-27 in view of Masai et al. (US Patent Application Publication
2003/0177005).

5. Regarding claim 1, Garg teaches a method for audio-visual speech recognition 
comprising: providing an acoustic-only data model and an acoustic-visual data model (pages 
24-26; section 2, entitled "The Multi-Stream HMM"; section 3, entitled "Stream Reliability 
Indicators"; section 4, entitled "Reliability Based Stream Exponents."); and decoding at least a 
portion of an input spoken utterance using selected data models (pages 24-26; section 2, entitled 
"The Multi-Stream HMM"; section 3, entitled "Stream Reliability Indicators"; section 4, 
entitled "Reliability Based Stream Exponents"; Tables 1-2). Garg does not specifically teach that a
data model is selected based on a condition associated with the environment of the speaker.
However, selecting an optimum data model for performing recognition based on environmental
conditions so as to improve recognition accuracy and performance was well known in the art of
speech recognition. Masai discloses (paragraph 75) a method and device for producing acoustic
models for recognition and specifically teaches that the speech recognition unit recognizes the
speech data and converts them into text data in accordance with the environment information of
the time when the speech data are uttered; the acoustic model for recognition selection unit
selects the acoustic model for recognition according to the environment information and
converts the speech data into text data by using the selected acoustic model for recognition.

It would have been obvious to one of ordinary skill in the art at the time of the invention to
modify the system of Garg to allow for the selection of the optimum data model, as suggested by
Masai, for the purpose of improving recognition accuracy and performance of the speech
recognizer, as was well known in the art.

Regarding claim 2, Garg and Masai teach storing the acoustic-only data model and the
acoustic-visual data model in memory such that model selection is made by shifting one or more
pointers to one or more memory locations where the selected model is located (pages 26-27,
section 5, "Database and Experiments").

Regarding claim 3, Garg and Masai teach model selection is based on a likelihood ratio 
test (pages 24-26; section 2, entitled "The Multi-Stream HMM"; section 3, entitled "Stream 
Reliability Indicators"; section 4, entitled "Reliability Based Stream Exponents"). 

Regarding claim 4, Garg and Masai teach model selection comprises selecting the 
acoustic-only data model when a result of the likelihood test is not greater than a threshold value 
(pages 24-26; section 2, entitled "The Multi-Stream HMM"; section 3, entitled "Stream 
Reliability Indicators"; section 4, entitled "Reliability Based Stream Exponents"). 

Regarding claim 5, Garg and Masai teach the model selection step comprises selecting
the acoustic-visual data model when a result of the likelihood test is not less than a threshold
(pages 24-26; section 2, entitled "The Multi-Stream HMM"; section 3, entitled "Stream
Reliability Indicators"; section 4, entitled "Reliability Based Stream Exponents").
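
For illustration only, the threshold-based model selection addressed in claims 3-5 might be sketched as follows. The Gaussian observation models, the function names, the default threshold of zero, and the choice to select the acoustic-only model when the result exactly equals the threshold are assumptions made for this sketch; they are not taken from Garg, Masai, or the claims.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution (hypothetical observation model)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_likelihood_ratio(observations, usable, unusable):
    """Sum of per-observation log-likelihood ratios.

    `usable` and `unusable` are (mean, std) pairs describing a visual feature
    under the two hypotheses; both are assumptions made for this sketch.
    """
    return sum(
        gaussian_log_pdf(x, *usable) - gaussian_log_pdf(x, *unusable)
        for x in observations
    )


def select_model(llr, threshold=0.0):
    """Pick a data model by comparing the test result with a threshold.

    A result exactly equal to the threshold selects the acoustic-only model
    here; that tie-breaking rule is an assumption.
    """
    return "acoustic-visual" if llr > threshold else "acoustic-only"


if __name__ == "__main__":
    llr = log_likelihood_ratio([0.9, 1.1, 1.0], usable=(1.0, 0.5), unusable=(0.0, 0.5))
    print(select_model(llr))
```

A larger threshold would make the sketch favor the acoustic-only model, which is one way a cost associated with a recognition error could enter the comparison.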

Regarding claim 6, Garg and Masai teach the threshold value is based on a cost
associated with a recognition error (Tables 1 and 2; section 3, "Stream Reliability Indicators").

Regarding claim 7, Garg and Masai teach the likelihood ratio test is based on one or more
observations of a given visual feature (Tables 1 and 2; section 3, "Stream Reliability Indicators").

Regarding claim 8, Garg and Masai teach the given visual feature is associated with the
mouth region of a speaker of the input utterance (pages 26-27, section 5, "Database and
Experiments").

Regarding claim 9, Garg and Masai teach the model selection is performed at a rate
substantially equivalent to an observation rate associated with the audio-visual speech
recognition system (pages 26-27, section 5, "Database and Experiments").

6. Regarding claims 10-22, these claims are similar in scope and content to method claims
1-9 and are therefore rejected under a similar rationale.


Response to Arguments 
7. Applicant's arguments filed March 30, 2006, have been fully considered but they are not
persuasive. Applicant argues that Garg fails to disclose selecting between an acoustic-only data
model and an acoustic-visual data model based on a condition associated with a visual
environment, and decoding at least a portion of an input spoken utterance using the selected data
model, and that Masai contains no disclosure relating to a selection between an acoustic-only
model and an acoustic-visual model. Applicant further argues that neither Garg nor Masai
individually teaches or suggests the limitations of the independent claims and therefore the
combination of Garg and Masai also fails to teach or suggest the limitations of the independent
claims.

In response to applicant's arguments against the references individually, one cannot show 
nonobviousness by attacking references individually where the rejections are based on 
combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re 
Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). In this instance, Garg was cited 
for teaching a method for audio-visual speech recognition implementing an acoustic-only data 
model and an acoustic-visual data model. While Garg does not specifically teach that a data model
is selected based on a condition associated with the environment of the speaker, it was well
known in the art to provide a means for selecting an optimum data model for performing
recognition based on environmental conditions so as to improve recognition accuracy and
performance. Masai was cited for teaching this optimum data model selection. Masai discloses
a method and device for producing acoustic models for recognition and specifically teaches that the
speech recognition unit recognizes the speech data and converts them into text data in
accordance with the environment information of the time when the speech data are uttered; the
acoustic model for recognition selection unit selects the acoustic model for recognition
according to the environment information and converts the speech data into text data by using
the selected acoustic model for recognition. Thus, the combination of Garg and Masai would
provide for a speech recognition system, which utilizes acoustic-only data models and
acoustic-visual data models (as provided by Garg), such that the optimum sets of acoustic-only
and/or acoustic-visual data models are selected and used for recognition as determined by
environment information of the time when the speech data is received (as provided by Masai).


8. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Angela A. Armstrong whose telephone number is 571-272-7598. 
The examiner can normally be reached on Monday-Thursday 11:30-8:00 PM.


If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, David Hudspeth, can be reached on 571-272-7843. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated 
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



Angela A Armstrong 
Primary Examiner 
Art Unit 2626 


AAA 

June 6, 2006 


